Knowledge Discovery and Analysis in Manufacturing
Abstract
The quality and reliability requirements for next generation manufacturing are reviewed, and current approaches are cited. The potential for augmenting current quality/reliability technology is described, and characteristics of potential future directions are postulated. Methods based on knowledge discovery and analysis in manufacturing (KDAM) are reviewed, and related successful applications in business and social fields are discussed. Typical KDAM applications are noted, along with general functions and specific KDAM-related technologies. A systematic knowledge discovery process model is reviewed, and examples of current work are given, including a description of successful applications of KDAM to the creation of rules for optimizing gas porosity in sand casting molds. Finally, directions in KDAM technology and associated research requirements are described, and comments related to the application and acceptance of KDAM are provided.

Introduction

Industries across the globe are pursuing "next generation manufacturing" (NGM) as a tactic for meeting rapidly expanding global needs for high performance, low cost, high quality products and processes, and as a strategy for revitalizing companies and industries that have become non-competitive over time [1,2]. Within the pursuit of NGM lies a specific question: How will manufacturers achieve "next generation quality and reliability" for new products and processes? Demanding global customers simply will not settle for current best-in-class performance. About fifty years ago, the quality and reliability of products began to exhibit rapid improvements. Commonly traced to the seminal work of Shewhart [3], fundamentally different approaches to quality and reliability were initiated, primarily in Japan after WW II. Referred to by terms such as total quality control [4] and total quality management [5], wide-scale application of these approaches in the U.S. occurred in the 1980s, initially in semiconductor fabrication.
These approaches now permeate manufacturing across the globe, with the effects most obvious to the public in areas such as increased automobile quality and reliability. Only an owner of an automobile manufactured in the 1970s can fully appreciate the profound impact of this quality and reliability revolution. The approaches that have enabled these improvements can be characterized by two fundamental concepts. The first is that quality is best achieved by controlling inputs (processes) vs. inspecting outputs (products). In addition to improving the quality of products, this approach minimizes shipment of the 10-15% defective products typically not caught by inspecting product [6]. The second fundamental concept is that the focus of output quality/reliability improvement efforts should be on minimizing variations in inputs through statistical analysis of samples of input data. Thus was born statistical process control (SPC) with its now-ubiquitous X-bar and R charts [7], and related statistics-based approaches now commonly applied to manufacturing, such as analysis of variance (ANOVA) [8], Taguchi methods [9], design for six sigma (DFSS) [10], and design of experiments (DOE) [11].

Current Situation

If we think of the set of statistics-based quality and reliability tools as a technology, we can assume that this technology follows a classic technology s-curve [12], where incremental improvements accumulate slowly over time as the technology is introduced, then increase rapidly with technology improvements and widespread application, and finally level off as the full potential of the technology is realized. We can further assume that the general pattern of technology s-curve progression [13,14] also applies here, i.e., when an incumbent technology's contribution to improvement (value) levels off, the technology is ripe for augmentation or replacement by a new technology, as illustrated in Figure 1.
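The s-curve behavior just described is commonly modeled as a logistic function. A minimal sketch, with entirely hypothetical parameters (L for the saturation level of cumulative improvement, k for the growth rate, t0 for the inflection point in time):

```python
import math

def s_curve(t, L=100.0, k=0.5, t0=10.0):
    """Cumulative improvement delivered by a technology at time t,
    modeled as a logistic (s-shaped) curve."""
    return L / (1.0 + math.exp(-k * (t - t0)))

# The marginal return flattens at the tail of the curve: the gain
# around the inflection point dwarfs the gain late in the life cycle.
early_gain = s_curve(11) - s_curve(9)   # steep middle of the curve
late_gain = s_curve(20) - s_curve(18)   # flattening tail
```

The diminishing `late_gain` is the quantitative version of the argument above: once an incumbent technology sits on the flat tail, further investment in it yields little, which is exactly when a successor technology becomes attractive.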
If this situation holds for quality and reliability technology, we can reasonably ask: Has the statistics-based quality/reliability paradigm reached maturity, and if so, what technology will follow in the s-curve progression?¹ Assuming that the technology s-curve model holds here, we can begin to answer the question of technology succession in quality and reliability by postulating three characteristics of next generation quality/reliability technology.

¹ When considering technology s-curves, it is essential to note that it is not necessarily the level of contribution of an incumbent technology that declines over time; it is the return-on-investment in technology improvements that declines. The question here is: Which technologies should I invest in to reach the next level of product and service quality and reliability?

Figure 1: Technology s-curve progression

Generally, increasingly powerful and available computer and communication capacity is generating an ever-expanding sea of data. The manufacturing environment mimics this general trend, with the content of manufacturing-related databases potentially extending far beyond the information scope of current statistics-based quality and reliability approaches to include areas such as warranty data, sales and marketing information, financial data, etc. These databases typically consist of many unrelated sets of data aggregated by many different entities within and outside of an organization, with each database geared toward supporting different organizational functions. We can predict that next generation quality/reliability technology will be based on integration of these massive extended databases. A common refrain heard from those directly involved in making next generation manufacturing a reality is that they are drowning in this rising sea of data.
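The kind of database integration described above can be illustrated with a toy outer join on a shared key, tolerating fields that one source lacks. The source names, field names, and records below are all hypothetical:

```python
# Hypothetical shop-floor measurements, keyed by part serial number.
process_db = {
    "SN-001": {"pour_temp": 1410, "mold_moisture": 3.1},
    "SN-002": {"pour_temp": 1395},           # moisture reading missing
    "SN-003": {"pour_temp": 1402, "mold_moisture": 2.8},
}

# Hypothetical warranty records, aggregated by a different part of
# the organization for a different purpose.
warranty_db = {
    "SN-001": {"claim": "porosity"},
    "SN-003": {"claim": None},               # no claim filed
}

def integrate(*sources):
    """Outer-join per-record field dictionaries on their shared key,
    keeping whatever fields each source happens to have."""
    merged = {}
    for src in sources:
        for key, fields in src.items():
            merged.setdefault(key, {}).update(fields)
    return merged

combined = integrate(process_db, warranty_db)
```

The point of the sketch is structural: records survive the merge even when a source is missing a field or an entire entry, which is the normal condition when data is aggregated by many different entities for many different purposes.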
Traditional analysis approaches to identifying underlying patterns and structures in data are producing diminishing returns relative to the growth in available data (the tail of the technology s-curve). We can therefore assume that next generation quality/reliability technology will include finding useful patterns and structures in data that are currently unperceivable using common statistical approaches. The task of identifying useful data patterns and structures in massive extended databases implies a third characteristic of next generation quality and reliability technology, which is the ability to effectively utilize highly incoherent, noisy, and corrupted data with missing field and record entries. This is a natural consequence of using data obtained from a variety of internal and external sources collected for purposes other than improving product designs and manufacturing processes.

Knowledge Discovery and Analysis

Given these characteristics, the next logical question is: Does such a technology currently exist? The answer is (of course): Yes. The set of tools and techniques grouped under terms such as data mining, machine learning, and knowledge discovery in data (KDD) [15-17] are designed specifically to provide the functions basic to next generation quality and reliability requirements, i.e., integration of massive extended databases to find useful but currently unperceivable patterns and structures in noisy data. Here, we will refer to the application of these and related technologies in manufacturing as KDAM: knowledge discovery and analysis in manufacturing. Currently, research and application of these technologies occurs most commonly in social and business fields, and is most apparent to the public in sales and marketing applications. A commonly encountered example is the amazon.com book recommendation system.
When a customer logs in at amazon.com, a personalized home page appears recommending a number of books of potential interest to the customer, categorized in several different ways, e.g., books similar to books that the customer has purchased, other books by authors of books the customer has purchased, books that other customers have purchased in addition to the books that the customer has purchased, etc. These recommendations are based on sophisticated data mining technologies applied to customer transaction data. Netflix, the world's largest on-line movie rental service, provides similar recommendations, but includes an on-line customer survey of preferences designed to increase the hit rate of movies selected from the recommended titles.²

² Recently, Netflix has offered the Netflix Prize (www.netflixprize.com), a $1 million award to any person or organization that produces a movie recommendation algorithm ten percent better than the existing Netflix algorithm [18]. An internet search on "netflix dataset" will provide the reader with interesting insights into this application of data mining and machine learning, a snapshot of how the Netflix Prize competitors are doing, and links to the actual Netflix dataset.

Nielsen Claritas (www.claritas.com) uses the technologies referenced here to provide a consumer segmentation system that combines demographic, consumer behavior, and geographic data to help marketers identify, understand, and target their customers and prospects with customized products and communications. For example, the Claritas PRIZM NE product classifies households in terms of 66 demographically and behaviorally distinct types, which are further segmented into social groups and "LifeStage" groups. These types and groups can be linked to specific geographical areas (zip codes), which can, for example, assist in identifying likely new store locations. Amazon and Netflix provide examples of applications targeted at the personal level.
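The "customers who bought X also bought Y" pattern described above can be approximated with simple pairwise co-occurrence counting; production recommenders are far more sophisticated. A minimal sketch over a hypothetical list of purchase baskets:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items one
# customer bought together.
transactions = [
    {"sql_primer", "data_mining", "stats_101"},
    {"data_mining", "machine_learning"},
    {"data_mining", "machine_learning", "stats_101"},
    {"sql_primer", "stats_101"},
]

# Count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1

def recommend(item, k=2):
    """Return the top-k items most often co-purchased with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(k)]
```

For instance, `recommend("data_mining")` surfaces the two items that co-occur with it most often across the baskets. The same counting idea, scaled up and refined, underlies the transaction-data mining the text describes.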
Claritas provides an example of an application at the group social segmentation level. The SPSS Clementine software suite (www.spss.com/clementine) provides an example of a high-level application that performs enterprise-wide "predictive analytics", which SPSS defines as including: 1) analysis of past, present, and projected future outcomes using a range of technologies including data mining and related technologies, and 2) decision optimization algorithms for determining which actions will drive the optimal outcomes. An example of Claritas PRIZM NE classification illustrates the manner in which data mining can be used in business and social applications. The Claritas "Urban Uptown" class (one of the 66 major PRIZM NE classes referenced above) is defined as being "...home to the nation's wealthiest urban consumers. Members of this social group tend to be affluent to middle class, college educated and ethnically diverse, with above-average concentrations of Asian and Hispanic Americans. Although this group is diverse in terms of housing styles and family sizes, residents share an upscale urban perspective that's reflected in their marketplace choices. Urban Uptown consumers tend to frequent the arts, shop at exclusive retailers, drive luxury imports, travel abroad and spend heavily on computer and wireless technology". One of the five groups within this Claritas class is the "Young Digerati", described as "the nation's tech–savvy singles and couples living in fashionable neighborhoods on the urban fringe. Affluent, highly educated and ethnically mixed, Young Digerati communities are typically filled with trendy apartments and condos, fitness clubs and clothing boutiques, casual restaurants and all types of bars – from juice to coffee to microbrew." Clearly, this type of characterization of geographic areas is quite useful for applications such as new store site selection.
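Segmentation systems of the kind just described typically rest on clustering. A minimal k-means sketch on hypothetical household features (income in $1000s, years of education); real systems like PRIZM NE use far richer data and methods:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: assign points to nearest center, then move each
    center to the mean of its cluster, repeating for `iters` rounds."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old center if a cluster goes empty
                centers[i] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers, clusters

# Hypothetical households: (income in $1000s, years of education).
households = [(35, 12), (38, 12), (40, 14),     # modest income
              (120, 18), (130, 20), (125, 19)]  # affluent, highly educated
centers, clusters = kmeans(households, k=2)
```

On this toy data the two clusters cleanly separate the modest-income and affluent households, which is the basic mechanism behind grouping households into demographically distinct types.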
Examples such as these illustrate that the technologies referenced here are well-suited to and well-established in social and business fields.³ It is interesting to note that while the statistical approaches developed initially for the shop floor are now receiving significant attention in office environments [19,20], technology flow in the opposite direction (business application to shop floor) is occurring for the approaches focused on here.

KDAM Applications

Successful applications of KDAM technology do exist [21-28]. Common applications include: Detection of root causes of deteriorating product
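Root-cause detection of the kind listed above is often approached with rule induction. A minimal sketch in the spirit of the classic 1R rule learner, over hypothetical casting records loosely echoing the gas-porosity example mentioned in the abstract:

```python
from collections import defaultdict, Counter

# Hypothetical casting records: (mold_moisture, pour_temp, outcome).
records = [
    ("high", "low",  "porosity"),
    ("high", "high", "porosity"),
    ("high", "low",  "porosity"),
    ("low",  "low",  "ok"),
    ("low",  "high", "ok"),
    ("low",  "high", "porosity"),
]
attributes = ["mold_moisture", "pour_temp"]

def one_r(records, attributes):
    """Pick the single attribute whose value -> majority-outcome rule
    makes the fewest errors on the records (1R-style induction)."""
    best = None
    for i, attr in enumerate(attributes):
        outcomes_by_value = defaultdict(Counter)
        for row in records:
            outcomes_by_value[row[i]][row[-1]] += 1
        rule = {v: c.most_common(1)[0][0]
                for v, c in outcomes_by_value.items()}
        errors = sum(n for v, c in outcomes_by_value.items()
                     for outcome, n in c.items() if outcome != rule[v])
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

attr, rule, errors = one_r(records, attributes)
```

On this toy data the learner singles out mold moisture as the attribute most predictive of porosity, illustrating how rule discovery can point an engineer at a candidate root cause worth investigating.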